Assessing the significance of knockout cascades in metabolic 

networks 

Oriol Guell,^'[2]Francesc Sagues/ Georg Easier,^ Zoran Nikoloski,^ and M. Angeles Serrano^ 

^ Departament de Qmmica Fwica, Universitat de Barcelona, 
Marti i Franques 1, 08028 Barcelona, Spain 
^Systems Biology and Mathematical Modeling, 
Max Planck Institute for Molecular Plant Physiology, 
Am Muhlenberg 1, 144^6 Potsdam, Germany 
^Departament de Fisica Fonamental, Universitat de Barcelona, 
Marti i Franques 1, 08028 Barcelona, Spain 

Abstract 

Complex networks have been shown to be robust against random structural perturbations, but 
vulnerable against targeted attacks. Robustness analysis usually simulates the removal of individual 
or sets of nodes, followed by the assessment of the inflicted damage. For complex metabolic 
networks, it has been suggested that evolutionary pressure may favor robustness against reaction 
removal. However, the removal of a reaction and its impact on the network may as well be 
interpreted as selective regulation of pathway activities, suggesting a tradeoff between the efficiency 
of regulation and vulnerability. Here, we employ a cascading failure algorithm to simulate the 
removal of single and pairs of reactions from the metabolic networks of two organisms, and estimate 
the significance of the results using two different null models: degree preserving and mass-balanced 
randomization. Our analysis suggests that evolutionary pressure promotes larger cascades of non- 
viable reactions, and thus favors the ability of efficient metabolic regulation at the expense of 
robustness. 

Keywords: metabolic networks; robustness; cascading failure; null models 



*E-mail Corresponding Author: oguell@ub.edu 



1 



I. INTRODUCTION 



Complex networks [IHS] may be affected by structural perturbations that modify their 
behavior or even lead to their collapse. Of all potential structural perturbations, the removal 
of individual nodes has received most attention. It has been found that complex networks 
are robust to accidental failures, yet fragile to targeted attacks knocking down their most 
connected nodes |H|5]. A similar approach has been applied in biology to metabolic networks 
which, as compared to other large-scale cellular networks, prove to be of particular interest 
due to the availability of high quality reconstructions based on complementary sources of 
experimental data, and due to the possibility of experimentally validating computational 
predictions [TlHT3] . Metabolic networks, modeled as complex networks [6HTO]. have been 
tested in terms of robustness in a wide variety of in silico experiments. 

Typical in silico strategies to quantify structural robustness of metabolic networks against 
removal of a single reaction consider the downstream effect of the removal and the size of the 
resulting cascade. Properties of the cascade, e.g., size or length, could then be considered 
as proxies for the potential damage inflicted by the removal. Finally, one would like to 
obtain estimates for the statistical significance of the obtained observations, which could be 
empirically carried out with respect to a well-chosen null model. For instance, by applying 
this strategy in combination with degree preserving randomization (see Sections 2.2 and 
2.3), Smart and coworkers p3j have determined that bacterial organisms may have evolved 
towards reducing the probability of having large cascades, thus, increasing robustness. 

Recently, more complicated perturbations have also been considered, namely, removal of 
pairs of reactions and sets of genes [15], and the topological significance of the results was 
evaluated with respect to degree preserving randomization. The employed degree preserving 
null model has its roots in the Configuration Model [16] for bipartite networks [T714T9] , that 
results in randomized network variants in which node degrees are preserved. However, such 
a degree preserving randomization does not account for the most basic physico-chemical 
constraints, and may lead, in the case of metabolic networks, to consideration of a reaction 
which is not mass (i.e., stoichiometrically) balanced (a reaction which does not preserve 
the same type and number of atoms on its substrate and product sides). As a result, the 
randomized networks may not be chemically feasible. As an alternative, a novel null model 
called mass-balanced randomization has been recently proposed [20] to account for this issue. 
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Mass-balanced randomization results in randomized network variants in which: (1) every 
reaction is mass balanced and (2) degrees of reactions are unaltered. By comparing the 
differences between the analyzed property in the original network and in its randomized 
variant, this null model is able to distinguish between properties which result solely from 
physico-chemical constraints and those selected for by evolutionary pressure [2T] . 

Here we explore the metabolic networks of Staphylococcus aureus fl^ and Escherichia 
coli [22] to determine the extent to which the robustness against the failure of individual 
reactions and pairs of reactions, as quantified by the number of non-viable reactions caused 
upon the initial removal(s) of a single reaction or a pair of reactions, is bounded by structural 
constraints. To this end, we compare cascades in the original networks with those obtained 
from the two null models referred above: degree preserving (DP) and mass-balanced (MB) 
randomization. We find that the two null models give very different results, which is ex- 
plained in terms of the properties of the networks obtained by both randomization methods. 
We use Kolmogorov-Smirnov tests [23] to statistically assess whether the null models are 
enough to explain the resulting damage distributions in the original networks. Interestingly, 
we find that the two organisms exhibit cascades whose properties lie between those expected 
from the two considered null models, which suggest that factors other than node degrees 
or physical principles affect the considered features. Moreover, our findings point out that, 
in the analyzed metabolic networks, evolutionary pressure may not have lead towards min- 
imized damage spreading, which opposes earlier findings based solely on degree preserving 
randomization. Our results reinforce the importance of choosing an appropriate null model 
according to the question at hand, since the null model ultimately affects the interpretation 
of the findings. 



II. METHODS 



A. Representation of metabolic networks as complex networks 

We have modeled metabolic networks as bipartite graphs [10], whereby the set of nodes is 
portioned into metabolites and reactions, and there are no links between nodes of the same 
type. Metabolites are connected to the reactions by directed links, which allow differentiating 
between reactants and products. Every reversible reaction is split into two irreversible 
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reaction nodes, so that every reaction node included in the representation corresponds to an 
irreversible reaction. 

We use two different bacterial organisms in this study. E. coli is the most studied prokary- 
otic organism and it is the bacterial model which is most frequently used due to ease of 
experimental manipulation. To construct the bipartite representation of the metabolic net- 
work of E. coli [22], we use data from the BiGG database[25]. The resulting network contains 
Nji = 2066 reactions and Nm = 1667 metabolites. 

S. aureus is an anaerobic bacterium which is present world-wide. Its bipartite directed 
network reconstruction was obtained from [H], and consists of Nji = 640 reactions and 
Nm = 644 metabolites. 

The networks of the two analyzed species show marked similarities with respect to the 
cumulative degree distributions. For metabolites, the degree distribution shows a typical 
power-law form, P{k) ~ k~'^, with an exponent 7 = 2.13 for S. aureus and 7 = 2.09 for E. 
coli. On the contrary, reactions show a peaked distribution centered at the average degree 
corresponding to each network, having a value of < >= 4.8 for S. aureus and a value of 
< kji>= A.3 for E. coli. 

B. Cascading failure algorithm 

A cascading failure algorithm [Tl] is applied to spread the initial perturbation through 
the network and to compute the corresponding damage. Crucial to the algorithm are the 
concepts of viable metabolites and reactions. A metabolite is considered viable if it has at 
least one incoming and one outgoing connection, so as to prevent depletion or accumulation. 
This structural condition is a prerequisite for the network as a whole to operate at a positive 
steady state. On the other hand, reactions are viable if and only if all of the participating 
metabolites are viable. The algorithm then starts with a network from which an initial 
set of reactions is removed. In the following step, all reactions and metabolites that, as a 
consequence, become non-viable are removed, which in its turn results in additional changes 
of the viability status of the nodes. When only viable reactions and metabolites remain 
in the network, the damage inflicted by the initial perturbation is quantified as the final 
number of non- viable reactions (see Figure [T]). Here, we consider perturbations by initially 
removing every single reaction and each pair of reactions. 
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FIG. 1: Example of the application of the cascading failure algorithm to a metabolic network, a) 
For clarity, metabolites 4 and 5 are labeled with R and 7 and 8 with P depending on whether 
they are reactants or products of the reversible reaction. The cascade starts when reaction Rc 
fails, b) Therefore, metabolites 3 and 6 become non-viable. Because metabolite 6 is connected 
to reaction Rg, the later becomes non-viable, turning also metabolite 12 non-viable. Notice that 
metabolite 11 loses one in connection, but it is still viable, meaning that one of the waves of the 
cascade stops here. However the other wave keeps spreading, c) Metabolite 5 becomes inviable, 
causing the reversible reaction Rd to remain viable only towards the production of metabolites 
7 and 8. e) Consequently, metabolite 4 becomes non-viable, and so its associated reactions also 
become non-viable. The cascade spreads until all metabolites and reactions affected by the cascade 
remain viable. Finally, note that metabolites 1, 2, 3, 13, and 14, which initially have no incoming 
or outgoing connections, are not considered inviable by the algorithm. 



5 



Benchmark cascades are computed by first randomizing the original network and then 
performing the cascade algorithm on the randomized version. For each of the two null 
models, damage distributions are obtained for a hundred realizations, then averaged, and 
finally compared to the original damage distribution to assess the statistical significance and 
quantitative behavior of the observation. 

C. Degree preserving randomization 

The degree preserving randomization method approximates the Configuration Model for 
bipartite networks [T6HT9] and works as follows. A pair of links of the network is chosen at 
random and their targets are swapped, unless this would lead to the repeated occurrence of 
a metabolite in a reaction. The randomization algorithm is summarized as follows: 

1. Pick two links at random: mi — )■ ri and m2 — ?• r2 or ri — )■ mi and r2 — )■ m2. 

2. Swap the end of the links avoiding repeated links and self-production: (mi — )■ r2 and 
m2 — )■ Ti or ri — )■ m2 and r2 — )■ mi). 

3. Repeat until we perform swappings, where L is the total number of links. 

4. Make several realizations of the randomized metabolic network following the three 
previous steps. 

The links from reversible reactions are rewired independently of those from irreversible 
reactions, using the same steps previously mentioned, in order to preserve the degrees of 
metabolites corresponding to reversible and irreversible reactions, respectively. A scheme 
showing this algorithm is shown in Figure [2j It is easy to show that the networks obtained 
using this method preserve the degrees of both metabolites and reactions, as illustrated in 
Figure |3| 

D. Mass-balanced randomization 

Mass-balanced randomization generates randomized networks by rewiring the links cor- 
responding to substrate-reaction or product-reaction relationships, while preserving atomic 
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1. An edge pair is chosen uniformly at random 
from the network 



Glucose 



Fructose 
C6H12O6 




Dihydroxyacetone 



Pyruvate 







Phosphoenol pyruvate 



Dihydroxyacetone- 
phosphate 



i 



Degree preserving 
randomization 



Glucose 



Dihydroxyacetone- 
phosphate 
C3H5O6P 




Dihydroxyacetone 



Pyruvate 






Phosphoenol pyruvate 



Fructose 



2. The targets of the edges are switched. 

The reactions and compounds have the same in- 

and out-degrees. 



1. A reaction is chosen uniformly at random 
from the network 



Dihydroxyacetone 





Phosphoenol pyruvate 
C3HAP 



i 



3 Glycolaldehyde 





2 Phosphoenolpyruvate 
C3HAP 



Pyruvate 





Dihydroxyacetone- 
phosphate 



Mass-balanced 
randomization 



2 Pyruvate 





2 Dihydroxyacetone- 
phosphate 



2. A compound is replaced and the stoichiometric 
coefficients recalculated. 

The resulting reaction has the same in- and out- 
degree and is mass balanced: 
3C,Hfi, + 2C3HAP -> 2C3H3O3 + 2C3H AP 



FIG. 2: Left: Scheme of the degree preserving randomization algorithm. In- and out-degrees are 
conserved, but mass balance is not satisfied. Right: Scheme of the mass-balanced randomization. 
In this case metabolites are switched only if the new reaction is mass balanced; while reaction 
degrees are kept constant, the degrees of metabolites are not preserved. 
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mass balance of the reactions |20]. Given a reaction r, its atomic mass balance is given by: 



where Er is the set of substrates and Pr the set of products in r, me,mp are the vectors 
of sum formulas of e and p, respectively, and Se,r, Sp^r their stoichiometric coefficients. For 
instance, consider the reaction A ^ B, with = tub = C6H12O6. Then, A may be 
substituted by a compound C with mc = CsHgOs from within the network, resulting in 
the randomized reaction 2 C ^ B, which satisfies Equation [T] since 2 C3H6O3 = C6H12O6 
(Figure [2]). In addition to substituting individual substrates or products, the method also 
allows more complex substitutions involving pairs of substrates or products, yielding a large 
number of possible substitutions. 

The motivation for preserving atomic mass balance of reactions, a fundamental physico- 
chemical constraint, is that the resulting null model allows estimating the importance of 
network properties with respect to evolutionary pressure. As biological systems and their 
properties evolve under physical constraints and evolutionary pressure, a null model which 
satisfies physical principles but does not account for evolutionary pressure differs from a 
metabolic network only in the properties which are affected by evolutionary pressure. Thus, 
a property deemed statistically significant following mass-balanced randomization is beyond 
basic physical constraints and likely to be a result of evolutionary pressure [21]. Again, a hun- 
dred randomized networks are generated from a given metabolic network. The method pre- 
serves mass balance and reaction degrees, while the stoichiometric coefficients and metabolite 
degrees are changed (Figure [3^, b). 

III. RESULTS 

A. Individual removal of reactions 

First, we study cascades triggered by individual removal of reactions, from r = 1 to 
r = Nr. We define the size of a cascade, dr, or damage caused by the removal of a reaction 
r as the number of resulting non- viable reactions. 

The cumulative probability distributions P{d'^ > dj.) give the probabilities that a damage 
is at least as large as dr, and may be interpreted as a measure of the robustness of a network 
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FIG. 3: Comparison of the degrees of reactions and metabolites obtained by the two nuU models 
applied to the network of E. coli. In this representation, each point is a reaction or a metabolite with 
coordinates (kreahkrandomized) , where kreal is the metabolite/reaction degree in the original network, 
and krandomized the Corresponding degree in a randomized network. Points fall in the diagonal if 
degrees are preserved in the randomized networks, a) Mass-balanced randomization (MB). This 
method gives networks in which the degrees of the reactions are preserved. However, b) degrees 
of metabolites are not conserved, c) Degree preserving randomization (DP). This method gives 
networks with preserved degrees of reactions, d) Degrees of metabolites are also preserved with 
degree preserving randomization, however at the expense of violating mass balances of reactions. 

with respect to reaction removal. We determined the cumulative probability distributions 
of damage in the two metabolic networks under analysis in this study and compared them 
to the averaged distributions associated to their randomized variants from degree preserving 
and mass-balanced randomization (Figure |4]). 

All distributions show a similar power-law distribution, with an exponential cut-off in- 
dicating possible finite size effects. The detected heterogeneity indicates that the failure of 
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FIG. 4: Distributions of damage caused by removal of reactions, a-b) Cumulative probability dis- 
tributions for S. aureus (green) and E. coli (red). Averaged distributions over 100 randomizations 
of the original networks are shown for degree preserved (dashed line) and mass-balanced random- 
ization (continuous line), c-d) Damages caused by pairs of removal of reactions, with the same 
symbols as in a-b). 

most reactions results in small damage, while some specific reactions affect large parts of 
the corresponding network, which is in agreement with the robustness previously attributed 
to metabolic networks [5J. However, here we observe that the same trend also holds for 
the randomized networks, both from degree preserving and mass-balanced randomization. 
Thus, the heterogeneous behavior of the damage distributions is a general feature which is 
not specific to metabolic networks, and thus insufficient to explain robustness of metabolism. 

Instead, when comparing the distributions of damages between the original and random- 
ized networks, we observe that the distributions of the original networks lie in between the 
distributions of the two null models. Particularly, in both organisms, damages are smaller 
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Organism 


SR 


PR 


MB 


DP 


MB 


DP 


S. aureus 
E. coll 


0.19/0 
0.19/0 


0.085/0.0002 
0.082/2 • 10^12 


0.27/0 
0.21/0 


0.15/0 
0.13/0 



TABLE I: Kolmogorov-Smirnov tests for comparing single reaction (SR) and pairs of reactions (PR) 
failme cascades in the two metabolic networks with both randomization methods, mass-balanced 
(MB) and degree preserving (DP). We give the values of the KS statistic / associated significance 
level. 

compared to their degree preserving randomizations, but larger when compared to their 
mass-balanced randomizations. Thus, the robustness of the analyzed networks cannot be 
explained by the distribution of degrees or by basic physical constraints. For the DP null 
model, this finding indicates that robustness is positively influenced by factors which are 
independent of the degrees. The results from the MB null model suggest that evolutionary 
pressure leads to larger cascades of non-viable reactions, and thus lower robustness (see 
Discussion). 

At first glance, the damage distributions of the original networks seem to closely resemble 
those of the mass-balanced randomized networks. To check whether or not the cumulative 
probability distributions significantly differ between the original networks and their random- 
ized variants, we perform Kolmogorov-Smirnov tests \^^\ (Table [l]). The null hypothesis is 
that the two samples are drawn from the same distribution, which is rejected if the ob- 
tained p-value is no greater than the standard significance level a = 0.05. In this case, the 
compared distributions are considered significantly different. 



B. Failure of pairs of reactions 

In the following, we analyze failure cascades resulting from the removal of each possible 
pair of reactions, where both r and r' run from r, r' = 1 to r,r' = Nji (r 7^ r'). Similar to 
the previous section, we determine the cumulative probability distributions P{d[.^, > drr') 
of the damage drr' resulting from the knockout of two reactions r and r', which provides a 
measure for the robustness of a network with respect to the failure of reaction pairs (Figure 
a. 
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Both organisms E. coli and S. aureus display again similar results: the distributions of 
damages in the real networks show a power-law behavior followed by an exponential cut- 
off. However, in this case the transition is more accentuated. The distributions of the 
original networks lie again between the distributions of the two null models. Consequently, 
our observations from Section 3.1 also hold for the failure of reaction pairs: robustness is 
positively influenced by factors independent of the degrees, but negatively influenced by 
evolutionary pressure. 

Kolmogorov-Smirnov tests were applied in order to test if the distributions given by the 
null models differ from those in the original network (Table |l]). As for individual removal of 
reactions, we chose a threshold value of a = 0.05. Again, the distributions of the original 
networks are significantly different from those of both randomization methods, with values 
of 0. 

IV. DISCUSSION 

The obtained results reveal important findings by comparing the damage caused by the 
failure of single and pairs of reactions in the original metabolic networks to those measured 
in the randomized variants. 

The cascade algorithm produces larger damages in the original networks as compared to 
those in mass-balanced randomized networks, but smaller cascades as compared to those 
in degree preserving randomized counterparts. A possible explanation is offered by the 
difference in global properties of the networks obtained from the two randomization methods 
|21j . Degree preserving randomization decreases the average path length and increases 
the clustering coefficient of the randomized network, increasing its 'small-world' property. 
Consequently, such networks are more interconnected, and thus a cascade may in principle 
propagate further in the network. The opposite holds for mass-balanced randomization, 
which increases the average path length while decreasing the clustering coefficient of the 
randomized network, so that the spread of the damage is less likely. Although the average 
path length does not resemble the length of metabolic inter-conversion, the small-world 
property may still affect the impact of removal of reactions due to its functional importance. 

In Tables m the significance levels obtained by the Kolmogorov-Smirnov test show that all 
distributions obtained from the null models are significantly different from those observed 
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in the original networks. The distributions, though visually similar to the original networks, 
differ significantly between the original and randomized networks. The fact that the damage 
distributions of the original networks differ significantly from those in the degree preserved 
randomized versions allow us to conclude that other properties beyond the degrees are 
important for determining the size of cascades in the original networks. 

We also point out that the principle of cascade propagation relies on violation of a struc- 
tural precondition for a steady-state, namely that all metabolites can be produced and 
consumed in order to avoid their depletion or accumulation. However, the steady-state 
assumption is only meaningful for networks which satisfy fundamental physical principles. 
We therefore also employed mass-balanced randomization, which guarantees preservation of 
mass-balance, and thus allows us to discern whether the measured property is a result of 
basic physical principles, or, instead, whether it is affected by evolutionary pressure. Since 
the size of cascades in mass-balanced randomized networks is significantly lower than those 
in real networks, evolutionary pressure may indeed lead to larger cascades. 

Consequently, this finding indicates that evolutionary pressure may favor lower robustness 
of metabolic networks with respect to the failure of reactions, seemingly contradicting the 
general requirement of robustness in biological systems. However, on one hand, this finding 
may be a result of the evolutionary versatility of metabolic networks, which favors organisms 
that are able to evolve quickly, i.e., by few modifications to their metabolic networks. On 
the other hand, we point out that a cascade may not only be interpreted as the harmful 
spreading of failure, but also as the ability to regulate metabolism by activating/deactivating 
reactions, e.g., by transcriptional regulation pi]. Thus, large cascades, favored by evolu- 
tionary pressure, may point at the evolutionary requirement of regulating large parts of 
metabolism, such as pathways, through the regulation of small sets of enzyme-coding genes. 
The ability to regulate the activity of metabolic reactions by deactivating competing reac- 
tions is a well-known principle of metabolism. Our results thus indicate that evolutionary 
pressure favors the ability of efficient metabolic regulation at the expense of robustness 
to gene knockouts, pointing at the necessary integration of trade-offs from various cellular 
functions. 
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